(57) Abstract 



A telecommunications network service (figure 1) overcomes the 
annoying effects of transmitted noise by signal processing which filters 
out the noise using iterative estimations of a linear predictive coding 
speech model. The speech model filter uses an accurate, updated estimate 
of the current noise power spectral density, based upon incoming signal 
frame samples which are determined by a voice activity detector (25) 
to be noise-only frames. A novel method of calculating the incoming 
signal using the linear predictive coding model provides for making 
intraframe iterations of the present frame based upon a selected number 
of recent past frames and up to two future frames (figure 3 and figure 
10). The processing is effective notwithstanding that the noise signal is 
not ascertainable from its source. 
TRANSMITTED NOISE REDUCTION 
IN COMMUNICATIONS SYSTEMS 



FIELD OF THE INVENTION 

This invention relates to enhancing the quality of speech in a noisy 
telecommunications channel or network and, particularly, to apparatus which 
enhances the speech by continuously removing noise content through a novel use of 
linear predictive coding. 



BACKGROUND OF THE INVENTION 

In all forms of voice communications systems, noise from a variety of 
causes can interfere with the user's communications. Corrupting noise can occur 
with speech at the input of a system, in the transmission path(s), and at the receiving 
end. The presence of noise is annoying or distracting to users, can adversely affect 
speech quality, and can reduce the performance of speech coding and speech 
recognition apparatus. 

Speech enhancement technology is important to cellular radio telephone 
systems which are subjected to car noise and channel noise, to pay phones located in 
noisy environments, to long-distance communications over noisy radio links or other 
poor paths and connections, to teleconferencing systems with noise at the speech 
source, and to air-ground communication systems where loud cockpit noise corrupts 
pilot speech and is both wearing and dangerous. Further, as in the case of a speech 
recognition system for automatic dialing, recognition accuracy can deteriorate in a 
noisy environment if the recognizer algorithm is based on a statistical model of clean 
speech. 

Noise in the transmission path is particularly difficult to overcome, one 
reason being that the noise signal is not ascertainable from its source. Therefore, 
suppressing it cannot be accomplished by generating an "error" signal from a direct 
measurement of the noise and then cancelling out the error signal by phase inversion. 

Various approaches to enhancing a noisy speech signal when the noise 
component is not directly observable have been attempted. A review of these 
techniques is found in "Enhancement and Bandwidth Compression of Noisy Speech," 
by J. S. Lim and A. V. Oppenheim, Proceedings of the IEEE, Vol. 67, No. 12, 
December 1979, Section V, pp. 1586-1604. These include spectral subtraction of the 
estimated noise amplitude spectrum from the whole spectrum computed for the 
available noisy signal, and an iterative model-based filter proposed by Lim and 
Oppenheim which attempts to find the best all-pole model of the speech component 
given the total noisy signal and an estimate of the noise power spectrum. The 
model-based approach was used by J. H. L. Hansen, in "Constrained Iterative Speech 
Enhancement with Application to Speech Recognition," by J. H. L. Hansen and M. A. 
Clements, IEEE Transactions on Signal Processing, Vol. 39, No. 4, April 1991, pp. 
795-805, to develop a non-real-time speech smoother, where additional constraints 
across time were imposed on the speech model during the Lim-Oppenheim iterations 
to limit the model to changes characteristic of speech. 

The effects of the earlier methods in the Lim/Oppenheim reference are 
to improve the signal-to-noise ratio after the processing, but with poor speech quality 
improvement due to the introduction of non-stationary noise in the filtered outputs. 
Even very low level non-stationary noise can be objectionable to human hearing. 
The advantage of smoothing across time frames in Hansen's non-real-time smoother 
is to further reduce the level of the non-stationary noise that remains. Hansen's 
smoothing approach provides considerable speech quality enhancement compared 
with the methods in the Lim/Oppenheim reference, but this technique cannot be 
operated in real time since it processes all data, past and future, at each time frame. 
The improvement therefore cannot work effectively in a telecommunications 
environment. One of the improvements described below is to alter the Hansen 
smoother to function as a filter that is compatible with this environment. 

SUMMARY OF THE INVENTION 

The invention is a signal processing method for a communication 
network, which filters out noise using iterative estimation of the LPC speech model 
with the addition of real-time operation, continuous estimation of the noise power 
spectrum, modification of the signal refiltered each iteration, and time constraints on 
the number of poles and their movements across time frames. The noise-corrupted 
input speech signal is applied to a special iterated linear Wiener Filter, the purpose of 
which is to output in real time an estimate of the speech which then is transmitted 
into the network. 

The filter requires an accurate estimate of the current noise power 
spectral density function. This is obtained from spectral estimation of the input in 
noise gaps that are typical in speech. The detection of these noise-only frames is 
accomplished by a Voice Activity Detector (VAD). When noise-only is detected in 
the VAD, the filter output is attenuated so that the full noise power is not propagated 
onto the network. 
When speech plus noise is detected in the time frame under 
consideration by the filter, an estimate is made as to whether the speech is voiced or 
unvoiced. The order of the LPC model assumed in the iterated filter is modified 
according to the speech type detected. As a rule, the LPC model order is 
M = Fs + (4 or 5) if voiced speech and M = Fs if unvoiced speech in the time frame, 
where Fs is the speech bandwidth in kHz. This dynamic adaptation of model order 
is used to suppress stray model poles that can produce time-dependent modulated 
tonelike noise in the filtered speech. 

In accordance with another aspect of the invention, a tracking of 
changes in the noise spectrum is provided by updating with new noise-only frames to 
a degree that depends on a "distance" between the new and old noise spectrum 
estimates. Parameters may be set on the minimum number of contiguous new noise 
frames that must be detected before a new noise spectrum update is estimated and on 
the weight the new noise spectrum update is given. 

These and further inventive improvements to the art of using iterative 
estimation of a filter that incorporates an adaptive speech model and noise spectral 
estimation with updates to suppress noise of the type which cannot be directly 
measured are hereinafter detailed in the description to follow of a specific novel 
embodiment of the invention used in a telecommunication network. 

DESCRIPTION OF THE DRAWING 

FIG. 1 is a diagram of an illustrative telecommunications network 
containing the invention; 

FIG. 1A is a signal processing resource; 

FIG. 2 is a diagram of a smoothing operation practiced in the invention; 

FIG. 3 is a flowchart showing the framework for speech enhancement; 

FIG. 4 is a diagram of apparatus which generates the iteration sequence 
for constrained speech enhancement; 

FIG. 5 is a diagram depicting the interframe smoothing operation for 
LPC roots of the speech model; and the intraframe LPC autocorrelation matrix 
relaxation from iteration to iteration; 

FIG. 6a is a diagram showing a method for updating each iteration of the 
current frame; 

FIG. 6b is a diagram showing the improved method used for updating 
each iteration of the current frame; 



WO 95/15550 



PCT/US94/12998 




FIG. 7 is a table of smoothing weights for the LSP position roots to 
smooth across seven speech frames around the current frame; 

FIGS. 8 and 9 are signal traces showing aspects of the noise estimator; and 

FIG. 10 is a description of the steps used to update the required noise 
spectrum used in the Wiener Filter. 

DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT 

The invention is essentially an enhancement process for filtering 
in-channel speech-plus-noise when no separate noise reference is available and which 
operates in real time. The invention will be described in connection with a 
telecommunications network, although it is understood that the principles of the 
invention are applicable to many situations where noise on an electronic speech 
transmission medium must be reduced. An exemplary telecommunications network 
is shown in FIG. 1, consisting of a remotely located switch 10 to which numerous 
communications terminals such as telephone 11 are connected over local lines such 
as 12 which might be twisted pair. Outgoing channels such as path 13 emanate from 
remote office 10. The path 13 may cross over an international border 14. The path 
13 continues to a U.S.-based central office 15 with a switch 16 which might be a No. 
4ESS switch serving numerous incoming paths denoted 17 including path 13. 

Switch 16 sets up an internal path such as path 18 which, in the 
example, links an incoming call from channel 13 to an eventual outgoing 
transmission channel 19, which is one of a group 19 of outgoing channels. The 
incoming call from channel 13 is assumed to contain noise generated in any of the 
segments 10, 11, 12, 13 of the linkage; the noise source, therefore, cannot be directly 
measured. 

In accordance with the invention, a determination is made in logic unit 
20 whether noise above a certain predetermined threshold is present in the switch 
output from channel 13. Logic unit 20 also determines whether the call is voice, by 
ruling out fax, modem and other possibilities. Further, logic unit 20 determines 
whether the originating number is a customer of the transmitted noise reduction 
service. If logic unit 20 makes all three determinations, the call is routed to 
processing unit 21 by switch 22; otherwise the call is passed directly through to 
channel 19. While only one processing unit 21 is shown, all of the channels 
outgoing from switch 16 are connectable to other processors 21 (not shown). 
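
The three-way determination made by logic unit 20 can be illustrated by a minimal sketch. The names CallInfo, route_call and noise_threshold are hypothetical and are introduced only for illustration; the specification does not state how the noise level is measured or what threshold is used.

```python
from dataclasses import dataclass


@dataclass
class CallInfo:
    noise_level: float    # measured noise on the switch output (units assumed)
    is_voice: bool        # True once fax, modem and other non-voice calls are ruled out
    is_subscriber: bool   # True if the originating number subscribes to the service


def route_call(call: CallInfo, noise_threshold: float) -> str:
    """Return 'processor' only when all three determinations hold."""
    noisy = call.noise_level > noise_threshold
    if noisy and call.is_voice and call.is_subscriber:
        return "processor"   # switch 22 routes the call to processing unit 21
    return "bypass"          # call passes directly through to channel 19
```

For example, route_call(CallInfo(0.5, True, True), 0.1) selects the processor, while a fax call or a non-subscriber call is bypassed regardless of noise level.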
The incoming signal from noisy channel 13 may be processed to 
advantage by an analog filter (not shown) which has a frequency response restricted 
to that of the baseband telephone signal. 

In the system discussed here, the noisy speech presented to processor 21 
is digitized at an 8 kHz rate, and the time series are processed in frames. The frame 
size used is 160 samples (20 msec) and a 50% overlap is imposed on these blocks to 
ensure continuity of the reconstructed filtered speech. 

Referring now to FIG. 1A, processor 21 consists of a Wiener Filter 
where the signal spectrum for this filter is estimated by assuming an all-pole LPC 
model and iterating each frame to get the unknown parameters. This is filter 23 to 
which the noisy call is routed. The call also is routed via bypass 24 to the Voice 
Activity Detector (VAD) 25, which continuously detects noise or speech-plus-noise 
frames and determines if a speech frame is voiced or unvoiced. The required noise 
spectrum to be used in the Wiener Filter is estimated from noise-only frames 
detected by the VAD. 

When a processed frame is detected as noise only, VAD 25 signals a 
noise suppression circuit 26 to switch in a suppressor 27. In this mode, the 
noise-only input to filter 23 is attenuated substantially before its entry to the outgoing path 
19 to the far-end listener at terminal 28. Additionally, when a noise-only frame is 
detected, the VAD signals the update function 29 in filter 23 to make a new noise 
spectral estimate based on the current noise frames and to weight it with the previous 
noise spectral estimate. 

When speech is detected by the VAD, input to 26 is switched to 23 such 
that the filtered speech is passed to the outgoing line 19. In addition, the order of the 
LPC speech model for the iterated Wiener Filter in 23 is set at 10th order if voiced 
speech is detected and at 4th to 6th order for an unvoiced speech frame. The 
motivation for this adaptive order of speech model is that the iterative search for the 
LPC poles can result in false formants in parts of the frequency band where the ratio 
of signal power spectrum to noise power spectrum is low. This results in noise tones 
of random frequency and duration in the filtered output that can be objectionable to 
the human ear, even though they are very low level relative to the average signal 
amplitude. Hence, since the LPC order typically needed for unvoiced speech is only 
half that of voiced speech for the bandwidth of interest, and since unvoiced speech is 
usually weaker than voiced speech, it is important to modulate the LPC order such 
that the speech model is not overspecified. 
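
The order selection just described may be sketched as follows. The function name lpc_order, the class labels "voiced" and "unvoiced", and the default unvoiced order of 5 are assumptions made for illustration; the text states only that the order is 10 for voiced frames and 4 to 6 for unvoiced frames.

```python
def lpc_order(frame_class: str, voiced_order: int = 10, unvoiced_order: int = 5) -> int:
    """Pick the LPC model order from the VAD's speech classification."""
    if frame_class == "voiced":
        return voiced_order
    if frame_class == "unvoiced":
        # The embodiment uses 4th to 6th order for unvoiced frames.
        if not 4 <= unvoiced_order <= 6:
            raise ValueError("unvoiced order expected in the range 4..6")
        return unvoiced_order
    # Noise-only frames are attenuated rather than modelled, so no order applies.
    raise ValueError("no LPC order is defined for class %r" % frame_class)
```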
The processes practiced in the iterative filter 23 are based on the 
available filter approach in the Lim/Oppenheim reference and on the interframe and 
intraframe smoothing applied by J. H. L. Hansen to improve the iterative 
convergence for his non-real-time AUTO-LSP Smoother discussed in the 
Hansen/Clements reference. Variations realized by the present invention are added 
thereto. Filter 23 operates on an incoming noisy speech signal to obtain the 
approximate speech content. The filter operation will now be described. 

SIGNAL-MODEL SMOOTHING ACROSS ADJACENT TIME FRAMES 
If the speech is not already in digital form, filter 21 contains an 
incoming signal analog-to-digital converter 30, which generates frame blocks of 
sampled input. A frame size of 160 samples, or 20 msec, is a time duration sufficient 
for speech to be approximated as a statistically stationary process for LPC modeling 
purposes. The iterated Wiener Filter and the LPC model of the speech process used 
as one component of this filter are based on a stationary process assumption. Hence, 
it is significant that the frames are processed in these short time blocks. 

Referring now to FIG. 2, the input signal plus noise may be expressed 
by y[n] = s[n] + d[n], where y is the available input sample, and s and d are the 
signal and noise parts. The samples are blocked into frames which overlap 
substantially, for example, by 50%. The data blocks are each weighted by a time 
window, such as the Hanning window, so that the sum of the overlapped windowed 
frames correctly spaced in time will add to give the original input time series. The 
use of a window reduces the variance in the LPC model estimated for a data frame, 
and frame overlap provides a continuity in the reconstructed filtered signal output to 
19 in FIG. 1A. 
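
The framing and overlap-add property described above can be checked with a short sketch. It assumes the periodic form of the Hanning window, for which two windows offset by half a frame sum exactly to one; the names hann_periodic, split_frames and overlap_add are illustrative only, and no filtering is applied between analysis and reconstruction.

```python
import math

FRAME = 160        # samples per frame (20 msec at 8 kHz)
HOP = FRAME // 2   # 50% overlap


def hann_periodic(n_samples):
    """Periodic Hanning window; copies shifted by n_samples/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples)
            for n in range(n_samples)]


def split_frames(y):
    """Block y into windowed frames that overlap by 50%."""
    w = hann_periodic(FRAME)
    return [[w[n] * y[start + n] for n in range(FRAME)]
            for start in range(0, len(y) - FRAME + 1, HOP)]


def overlap_add(frames, length):
    """Sum the frames back at their original time positions."""
    out = [0.0] * length
    for k, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[k * HOP + n] += value
    return out


y = [math.sin(0.05 * n) for n in range(4 * FRAME)]
rebuilt = overlap_add(split_frames(y), len(y))
# The first and last half frames are covered by only one window, so the
# comparison is restricted to the interior samples.
interior_error = max(abs(a - b) for a, b in zip(y[HOP:-HOP], rebuilt[HOP:-HOP]))
```

In an actual filter each windowed frame would be processed before the overlap-add step; the sketch only confirms that the windowing itself does not distort the reconstructed series.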

As in the iterative AUTO-LSP smoother in the Hansen/Clements 
reference, there are two types of constraints for the present invention that are applied 
at each iteration of the Wiener Filter during the processing of the current frame of 
input data. These are the LPC autocorrelation matrix relaxation constraint applied at 
each intraframe iteration of the current frame, and the interframe smoothing of the 
current frame's LPC speech model pole positions across the LPC pole positions 
realized at each iteration for adjacent past and future frames. The LPC pole 
constraints are not applied directly since these occur as complex numbers in the 
Z-plane, and the proper association to make of the complex pole positions for 
interframe smoothing is not clear. An indirect but simpler approach is possible by 
using an equivalent representation of the LPC poles called the Line Spectral Pair 
(LSP), the details of which are discussed in the Hansen/Clements reference and in 
Digital Speech Processing, Synthesis, and Recognition, by S. Furui, Marcel Dekker, 
Inc., New York, NY, 1989, Chapter V. The N-th order LPC model pole positions are 
equivalently represented by a set of N/2 LSP "position" roots and N/2 LSP 
"difference" roots that lie on the Unit Circle in the complex Z-plane. The utility of 
this equivalent LSP representation of the LPC poles is that lightly damped formant 
locations in the signal's LPC model spectrum are highly correlated with the LSP 
position roots, and the bandwidths of the LPC spectrum at these formants are highly 
correlated with the LSP difference roots. For a stable LPC model, the two kinds of 
LSP roots will lie exactly on the Unit Circle and will alternate around this circle. The 
ordering in position of LSP roots is obvious, and their smoothing across time frames 
is much simpler than the smoothing of complex LPC roots. In summary, the LPC 
poles at each iteration of the current frame being filtered are smoothed across LPC 
poles at the same iteration in adjacent frames by smoothing the equivalent LSP 
position roots and by applying a lower bound on the minimum distance of a 
"difference" root to the adjacent "position" root. The latter bounding restrains the 
sharpness of any LPC model's formants to be speech-like. 
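
A minimal sketch of the lower bound on root spacing follows. It treats each LSP root as an angle in radians on the Unit Circle and uses the 0.086 radian minimum distance reported for telecommunications tests of the method. The rule used here, moving a too-close difference root away from its nearest position root until the spacing equals the bound, is an assumption; the specification states only that the bound is enforced, not how.

```python
D_MIN = 0.086  # minimum root spacing in radians, as reported in the text


def constrain_difference_roots(p_roots, q_roots, d_min=D_MIN):
    """Keep each difference root at least d_min from its nearest position root.

    p_roots: smoothed LSP position roots (angles in radians)
    q_roots: LSP difference roots (angles in radians)
    """
    constrained = []
    for q in q_roots:
        nearest = min(p_roots, key=lambda p: abs(q - p))
        gap = q - nearest
        if abs(gap) < d_min:
            # Assumed rule: push q away on the side it already occupies.
            q = nearest + d_min if gap >= 0.0 else nearest - d_min
        constrained.append(q)
    return constrained
```

With position roots at 0.5 and 1.5 radians, a difference root at 0.52 is moved to 0.586, while one at 1.1 is left unchanged. A fuller treatment would also preserve the alternating order of the two root sets after the move.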

The invention calls for performing the LSP position smoothing across 
nearby contiguous time frames, but in the filter implemented for real-time 
application in a communication network, only a few frames ahead of the current 
frame being filtered can be available. For 20 msec frames with 50% overlap, the 
minimum delay imposed by using two future frames as indicated in FIG. 2 is 30 
msec. Even this small delay may be significant in some communication networks. 
The filter discussed here assumes four past frames and two future frames for 
smoothing. Although all of the past frames are available, only those correlated with 
the current frame should be used. 
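
The smoothing over four past frames, the current frame and two future frames can be sketched as a weighted average of each LSP position root. The function name smooth_position_roots and the layout of the weight table are assumptions, and any weight values supplied are placeholders rather than the values of FIG. 7, which are not reproduced in this text.

```python
def smooth_position_roots(frames, weights):
    """Weighted average of each LSP position root across seven frames.

    frames:  seven lists of position roots, ordered K-4, ..., K, K+1, K+2,
             all taken at the same intraframe iteration
    weights: one seven-element weight vector per root (hypothetical layout)
    """
    smoothed = []
    for i in range(len(frames[0])):
        w = weights[i]
        total = sum(w)  # normalize so the weights need not sum to one
        smoothed.append(sum(w[f] * frames[f][i] for f in range(len(frames))) / total)
    return smoothed
```

A weight vector with all of its weight on the current frame leaves that root unsmoothed, and a uniform vector returns the plain mean over the seven frames.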

ITERATION PROCESS 

The constrained iterative steps performed for the current frame K are 
shown in FIG. 3 with the details of iterations 1, ..., J indicated in FIG. 4. The Wiener 
Filter-LSP cycle is initiated by filtering the input block y[n] in the frequency 
domain by the Wiener Filter (WF), where the signal and noise power spectral 
estimates used are C * S_y(f) and S_d(f). That is, the initial filter's signal spectrum 
is the total input spectrum scaled by C to have the expected power of the signal: 
P_signal = P_total - P_noise. After initialization, the loop in FIG. 3 performs the 
following steps for iterative filtering of frame K: 
(1) Start the iteration loop by estimating the LPC parameters of the WF 
output signal in the Time Domain where the LPC autocorrelation calculation is 
subject to a relaxation over autocorrelation values of previous iterations for the 
frame. This relaxation step attempts to further stabilize the iterative search for the 
best speech LPC model. This is discussed below in conjunction with FIG. 5. 

(2) From the LPC model found in (1) at iteration j for speech frame K, 
solve for the LSP position roots P_j and difference roots Q_j. This requires the 
real-root solution of two polynomials each of one-half the LPC order. 

(3) Smooth the LSP position roots P_j for the current frame K across 
adjacent frames as indicated in FIG. 2 and FIG. 5C, and constrain the LSP difference 
roots Q_j away from the smoothed P_j roots. Each difference root Q_j is constrained 
to be more than a minimum distance D_min away from its closest smoothed P_j root. 
This prevents the smoothed LPC pole positions from being driven to the Unit Circle 
of the complex Z-plane. This "divergence" was a problem in the Lim-Oppenheim 
iterative filter of the Lim/Oppenheim reference that was addressed in the smoother in 
the Hansen/Clements reference. The constraint is desirable for realistic speech 
transmission. The value D_min = 0.086 radians has been used in telecommunications 
tests of the method. 

(4) Convert the smoothed LSP roots to smoothed LPC parameters, and 
compute the LPC signal model power spectrum S_s(f)_j scaled such that the average 
power equals the current K_th frame estimated signal power: P_signal = P_total - P_noise. 

(5) Use the smoothed LPC model signal spectrum S_s(f)_j and the 
current noise power spectrum estimate S_d(f) to construct the next iteration's 
Wiener Filter H_j(f) as shown in FIG. 3 and FIG. 4. We use the term Wiener Filter 
loosely here since this filter is the usual non-causal WF raised to a power pow. 
Values for pow between 0.6 and 1.0 have been used in telecommunications tests of 
the method. The larger pow is, the greater the change that occurs with each iteration, 
but with smaller pow the iterative search for the signal component should be more 
stable. 

(6) Filter a combination of the previous iteration's WF time-series 
output s_{j-1}[n] and the original input data y[n] with the current H_j(f) to get the 
next iteration of signal estimate s_j[n]. The linear combination used is 
(1 - B) * y[n] + B * s_{j-1}[n], where 0 ≤ B ≤ 1. If B = 0, the filter becomes an 
unconstrained Lim-Oppenheim iterative filter, and if B = 1 the input to the next WF 
is the previous WF output as done in the Hansen AUTO-LSP smoother in the 
Hansen/Clements reference. Values of B between 0.80 and 0.95 have been used in 
most of the experiments on this filter. With these values of B, some desirable 
features of both the Lim-Oppenheim filter and Hansen smoother were combined. 
This weighting concept is new in the present method. It gives additional control of 
the amount of final noise content vs. the degree of high-frequency filtering observed 
in the iterated filtered speech. 

The combining of features of the two previous signal-modeled iterative 
algorithms in the Lim/Oppenheim and Hansen/Clements references, specifically the 
weighted combination of Wiener Filter inputs each iteration, has been found 
subjectively to result in a less muffled sounding speech estimate, with a trade-off of 
slightly increased residual noise in the output. Combining is shown in FIGS. 2 and 
3, where it is seen that the input signal to the FILTER at the j_th iteration is the 
TOTAL INPUT y[n] and the Wiener Filter OUTPUT s_{j-1}[n] from the (j-1)_th 
iteration. 

(7) In the present implementation of the method, the number of 
iterations intra is an input parameter determined by experiment. For the results 
obtained in experiments, 4 to 7 intraframe iterations were used in 
combinations [intra, pow] such as [7, 0.65], [5, 0.8], and [4, 1.0], where values of the 
feedback factor B were between 0.80 and 0.95. The best values depend on the noise 
class and speech type. For broadband flat noise, intra = 6 may be typical, while only 
4 or 5 iterations may suffice when the noise power spectrum is heavily biased below 
1 kHz of the [0, 4 kHz] voice-band spectrum. 
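
The arithmetic of the initialization and of steps (5) and (6) can be sketched on spectra held as plain lists of per-bin power values. The function names and the per-bin gain form S_s / (S_s + S_d) for the usual non-causal Wiener Filter are assumptions made for illustration; the LPC and LSP processing of steps (1) through (4) is omitted.

```python
def initial_signal_spectrum(s_y, s_d):
    """Scale the total input spectrum by C so that its power is P_total - P_noise."""
    p_total = sum(s_y)
    p_noise = sum(s_d)
    c = max(p_total - p_noise, 0.0) / p_total
    return [c * value for value in s_y]


def wiener_gain(s_s, s_d, power=0.8):
    """Per-bin gain H_j(f): the usual Wiener ratio raised to the exponent pow."""
    gains = []
    for ss, sd in zip(s_s, s_d):
        denom = ss + sd
        gains.append((ss / denom) ** power if denom > 0.0 else 0.0)
    return gains


def mix_filter_input(y, s_prev, b=0.9):
    """Input to the next WF pass: (1 - B) * y[n] + B * s_{j-1}[n]."""
    return [(1.0 - b) * yn + b * sn for yn, sn in zip(y, s_prev)]
```

Setting b to 0 passes only the original input to the next iteration, and setting it to 1 passes only the previous filter output, which correspond to the two limiting cases named in step (6).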

An important aspect of the invention that is illustrated in FIG. 1A, item 
25, and also in FIG. 3 is the multiple application of a Voice Activity Detector 
(VAD), both to detect noise-only frames and to determine the best model order to 
apply in each frame by detecting voiced or unvoiced speech if speech is present. As 
noted before, the best order for an LPC speech model differs for voiced and unvoiced 
speech frames. Also, as noted earlier, the noise spectrum is updated only when no 
voice signal is detected in a sufficient number of contiguous frames. During a time 
interval when noise only is detected, noise suppressor 27 in switch 26 is activated to 
attenuate the outgoing signal, and the iterative filter 23 is then inactive. If, however, 
speech is detected, then 26 switches 30 to the output 19, and the class of speech, 
voiced or unvoiced, conditions the order of the LPC speech model to be used in the 
iterations. Also, the detection of change between the three possible states, 
noise-frame, voiced-frame and unvoiced-frame, causes the LSP history for past frames 
K-4, K-3, K-2, and K-1 to be reinitialized before application of smoothing to the 
current K_th frame. This is both necessary and logical for best speech filtering since 
the purpose of smoothing across past time frames is to average disparate noise by 
making use of the short-term stationarity of speech across the frames averaged. 
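
The reinitialization of the past-frame LSP history on a change of state can be sketched with a bounded buffer. The class name LspHistory is hypothetical, and clearing the buffer is an assumed reading of "reinitialized"; the specification does not say what values the history is reset to.

```python
from collections import deque


class LspHistory:
    """Holds LSP roots for up to four past frames of the same class."""

    def __init__(self, depth=4):
        self.frames = deque(maxlen=depth)   # frames K-4 .. K-1
        self.state = None                   # 'noise', 'voiced' or 'unvoiced'

    def push(self, state, lsp_roots):
        """Record the roots of a finished frame, resetting on a state change."""
        if state != self.state:
            self.frames.clear()   # assumed form of reinitialization
            self.state = state
        self.frames.append(lsp_roots)
```

After a transition from voiced to unvoiced frames, the buffer holds only unvoiced frames, so smoothing never averages roots across speech classes.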

FRAME PROCESSING 

The method of processing the frames to achieve real-time operation of 
filter 23 is shown in FIG. 6b. The K_th frame is assumed to be the present time 
reference point with frames K-4, K-3, K-2, K-1 the previously processed and 
archived frames while frames K+1 and K+2 are the available future frames. As in 
the smoothing approach in the Hansen/Clements reference, filter 23 smoothes the 
LSP roots of the K_th frame speech model with those of the past and future frames 
at each K_th frame iteration by using the past frame LSP histories at the iteration 
number in process. However, unlike the non-real-time smoother in the 
Hansen/Clements reference, the invention uses only two future frames and also 
stores the required past-frame LSP histories during the iterations done for each frame 
so that it accumulates these histories for the previous four frames to be smoothed 
with the current frame during the intraframe iterations. As in the method of the 
Hansen/Clements reference, the weights are tapered across the frames and the taper 
for each LSP root depends on the current frame's SNR as well as the SNR history 
up to this K_th frame. 

Another improvement in the invention is the use of table lookup for the 
frame LSP weights to be applied across frames. Weight tables applied in the 
invention are of the type shown in FIG. 7, whereas the weights required in the 
Hansen/Clements reference are obtained by time-consuming formula computations. 
The values applied in the tables in FIG. 7 can be easily and independently adjusted, 
unlike the constraints imposed by the formula used in the Hansen/Clements reference. 
The speech-frame thresholds at which the weight vector applied to a particular LSP 
root is switched from one table to another are selected independently. The general 
strategy in constructing smoothing vectors is to apply more smoothing to the higher 
order LSP positions (i.e., higher formant frequencies) as indicated reading left to 
right in these tables. This is due to the greater influence of noise at a given SNR 
observed on the higher order LSP speech positions. Another trend imposed on the 
table values is that smoothing is broad and uniform when the frame SNR is low and 
is decreased as SNR is increased to the point where no smoothing is applied at high 
SNR. This trend is due to the decreasing effect of noise on the filtered speech as 
frame SNR is improved. The frame SNR thresholds used to switch from one table of 
weight vectors to another are presently selected as multiples of the running estimate 
Npow of the noise power estimated in the VAD. The increasing thresholds used are 
Th1 = 2 * Npow for change from table Win1 to Win2, Th2 = 3 * Npow from table Win2 
to Win3, Th3 = 7 * Npow from table Win3 to Win4, Th4 = 11 * Npow from table Win4 
to Win5, with Win0 imposed if a sufficiently long run of low SNR frames occurs. 
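
The threshold-based choice of weight table can be sketched as a sorted lookup. The sketch assumes that the quantity compared against the thresholds is the current frame power and that the multiples of Npow are 2, 3, 7 and 11, the last being a reading of a partly illegible value in the source. The function name select_weight_table is hypothetical, and the Win0 rule for long runs of low SNR frames is omitted.

```python
from bisect import bisect_right

# Th1..Th4 expressed as multiples of the running noise power estimate Npow.
THRESHOLD_MULTIPLES = (2.0, 3.0, 7.0, 11.0)


def select_weight_table(frame_power, npow):
    """Return the name of the smoothing table for the current frame."""
    thresholds = [m * npow for m in THRESHOLD_MULTIPLES]
    # Each threshold crossed moves the choice one table toward less smoothing.
    return "Win%d" % (1 + bisect_right(thresholds, frame_power))
```

With Npow = 1.0, a frame power of 2.5 selects Win2 and a frame power of 20.0 selects Win5, the table applying the least smoothing.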

USE OF VOICE ACTIVITY DETECTION 

Estimating the noise power spectral density S_d(f) from noise-only 
frames using a voice activity detector (VAD), in accordance with the invention, 
provides an advantage. The filter process outlined in FIG. 3 is based on the 
assumption that the noise present during speech has the same average power 
spectrum as the estimated S_d(f). If the noise is statistically wide-sense stationary, 
noise estimates would not need to be updated. However, for the speech enhancement 
applications illustrated herein, and also for many other transmitted noise reduction 
applications, the noise energy is only approximately stationary. In these cases, a 
running estimate of S_d(f) is needed. Accordingly, a VAD such as detector 25 in 
FIG. 1A, having good immunity to noise at the operating SNR, is used to identify 
when speech is not present. Noise-only frames detected between speech segments 
are used to update the noise power spectrum estimate, as shown in FIG. 10. One 
suitable VAD for use in the FIG. 1A application is obtained from the GSM 06.32 
VAD Standard discussed in "The Voice Activity Detector for the PAN-EUROPEAN 
Digital Cellular Mobile Telephone Service," by D. K. Freeman et al., in IEEE Conf. 
ICASSP, 1989, Section S7.6, pp. 369-372. 

The pre-filtered and post-filtered speech examples shown in FIGS. 8 
and 9 indicate how voice activity detection is used to trigger attenuation of the 
outgoing signal when no voice is detected. As discussed in the Freeman et al. 
reference, the activation of the VAD on a noise frame is a convoluted balance of 
detected input level and repeated frame decisions of "no speech" properties. 

IMPROVED OUTPUT USING SPEECH CLASSIFIER 

Advantageously, a VAD speech classifier decision may be incorporated 
in the front end of the LPC model step as shown in FIG. 3. This is because the 
parameter settings such as LPC order in the AUTO-LSP algorithm are best adjusted 
according to the speech class (voiced or unvoiced) which is being filtered in the 
currently processed frame. If the speech within the processed frame can be classified 
reliably in the presence of noise, the enhancement may be improved. 
NOISE SPECTRUM ESTIMATION 

In accordance with another aspect of the invention, and referring to FIG. 
3 and FIG. 10, an improved sensitivity to changes in the noise signal spectra is 
provided by apparatus which updates spectrum S_d(f) with new "noise-only" frames 
to a degree that depends on how different the new noise spectra estimate S_d(f)_new 
is from the prior estimate S_d(f). If S_d(f)_{L-1} denotes the prior noise spectrum, the 
updated spectrum is 

    S_d(f)_L = (1 - A) * S_d(f)_{L-1} + A * S_d(f)_new 

where 0 < A < 1 is a normalized average of the error |S_d(f)_{L-1} - S_d(f)_new|^p over 
the frequency band. Typical values for p are 1 to 2. When a new noise spectrum 
estimate is "near" the prior estimate shape, A is near 0, but when the two spectral 
shapes are very different, A will be nearer 1 and the new noise frames will be heavily 
weighted in S_d(f)_L. Noise-frame decisions are made by the VAD, which is a 
relatively conservative estimator in the proper SNR range; hence the probability of 
correct noise decisions is high for SNR above 10 dB. The time between noise 
updates is not a parameter in this approach, only average spectral difference. In 
order to decrease the variance in estimating the spectrum S_d(f)_L, it is desirable to 
require a number of contiguous noise-frame decisions from the VAD before an 
update is valid. In tests of the enhancement, 5 or 6 contiguous noise-frames are 
required in order to update the spectrum. 
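
The distance-weighted update can be sketched as follows. The text does not give the normalization used for A, so the sketch assumes one: the summed error |old - new|^p divided by the summed |old|^p + |new|^p, which keeps A between 0 and 1 for non-negative power spectra. The function names and the default run length of 5 frames are likewise illustrative.

```python
def spectral_distance(old, new, p=2.0):
    """Weight A: an assumed normalized average of |old - new|^p over the band."""
    num = sum(abs(o - n) ** p for o, n in zip(old, new))
    den = sum(abs(o) ** p + abs(n) ** p for o, n in zip(old, new))
    return num / den if den > 0.0 else 0.0


def update_noise_spectrum(old, new, p=2.0):
    """Return the updated spectrum (1 - A) * old + A * new together with A."""
    a = spectral_distance(old, new, p)
    return [(1.0 - a) * o + a * n for o, n in zip(old, new)], a


def should_update(contiguous_noise_frames, min_run=5):
    """Require a run of noise-only VAD decisions before an update is valid."""
    return contiguous_noise_frames >= min_run
```

Identical old and new spectra give A = 0 and leave the estimate unchanged, while spectra with no overlap in frequency give A = 1, so that the new estimate replaces the old one.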

ADDITIONAL COMMENTS ON THE AUTO-LSP IMPROVED 
ITERATIVE FILTER 

As discussed previously, two types of constraints are used in the 
AUTO-LSP filter approach to improve the Lim-Oppenheim model-based iterative 
filter. These are the intraframe autocorrelation relaxation placed on the 
autocorrelation matrix which is computed for the LPC model each iteration, and the 
interframe smoothing over LSP roots that occurred in the iteration for the time 
frames around the frame being filtered. The constraint operations, performed each 
iteration, are shown in FIG. 5. The Smoothing Operation shows the order in 
which the constraints are to be applied during an iteration to obtain that iteration's 
Wiener Filter (WF) signal power estimate $S_s(f)_j$ from the previous iteration 
signal result $s[n]_{j-1}$. The iterative sequence of filtering the whole 
Signal + Noise $y[n]$ with the WF, where at each iteration the new estimate of the 
signal spectrum is inserted into the WF model, will, in theory, converge to the 
"best" signal estimate under the statistical assumptions imposed in the 
Lim/Oppenheim reference. In the real-world speech signal and noise classes of 
interest, the additional AUTO-LSP intraframe and interframe constraints assist the 
convergence and impose speech-like requirements on the signal spectrum in the WF. 
The intraframe autocorrelation relaxation is shown in Part B of FIG. 5, where the 
desired LPC model parameters are denoted as $a$, the autocorrelation matrix of the 
latest signal estimate $s[n]_j$ is $R_j$, and $b_j$ is the cross-correlation vector 
in the Yule-Walker AR method. The proposed relaxation factor is c = 0.7. The 
relaxation can be expanded to smooth over more than just the previous frame, but no 
significant advantage has been observed in doing this. The smoothing process is 
shown in FIG. 5C. Each large circle indicates the Unit Circle in the complex 
Z-plane. For the K-th frame and iteration j, the symbol "o" marks the LSP 
difference roots $Q_{Kj}$ and "*" marks the position roots $P_{Kj}$. For an LPC 
model that is Minimum Phase, the poles lie inside the Unit Circle and the $P_{Kj}$ 
and $Q_{Kj}$ will alternate along this circle. LSP smoothing is over the past and 
future frames, where the present set is K-4, K-3, K-2, K-1, K, K+1, K+2. Only the 
position roots $P_{Kj}$ are smoothed directly, and the difference roots $Q_{Kj}$ 
are forced to track the smoothed $P_{Kj}$. An inverse step gives the smoothed, 
scaled LPC signal model's spectrum $S_s(f)_j$. The complex roots of an equivalent 
LSP representation are simply the solution of a pair of real-root polynomials each 
with half the order of the original LPC polynomial, as is fully described in the 
Hansen/Clements and Furui references. 
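
The intraframe relaxation and the Yule-Walker solution referred to above can be sketched as follows. This is an illustrative sketch only, since FIG. 5 is not reproduced here. In particular, the exact form of the relaxation is an assumption: the sketch weights the current autocorrelation lags by c and the previous estimate by 1 - c, and the function names `autocorr`, `relax` and `levinson` are hypothetical. The Levinson-Durbin recursion is a standard way to solve the Yule-Walker equations and is used here as a stand-in for whatever solver the apparatus employs.

```python
def autocorr(x, order):
    """Autocorrelation lags r[0..order] of one frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def relax(r_current, r_previous, c=0.7):
    """Blend autocorrelation lags with relaxation factor c.

    Assumption: c weights the current estimate and (1 - c) the previous one.
    """
    return [c * rc + (1.0 - c) * rp for rc, rp in zip(r_current, r_previous)]


def levinson(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    Returns (a, err), where x[n] is predicted as sum(a[k-1] * x[n-k])
    for k = 1..order and err is the final prediction error power.
    Assumes r[0] > 0 and len(r) >= order + 1.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1.
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        updated = a[:]
        updated[m] = k_m
        for k in range(m):
            updated[k] = a[k] - k_m * a[m - 1 - k]
        a = updated
        err *= 1.0 - k_m * k_m
    return a, err
```

In use, the lags of the latest signal estimate would be passed through `relax` together with the lags retained from the previous estimate, and the blended lags given to `levinson` to obtain the LPC parameters for that iteration.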

A clear computational advantage exists in smoothing LSP roots in the 
AUTO-LSP approach rather than directly smoothing the complex domain roots of 
the LPC autoregressive models. Even though the LPC and LSP model 
representations are equivalent, a possible disadvantage of smoothing LSP roots 
across frames is that a nonlinear relationship exists between the LPC spectrum 
formant locations/bandwidths and the corresponding LSP position/distance roots. 
Specifically, as LPC roots move away from the Unit Circle, LSP position roots do 
not identify well with the LPC formant frequencies or bandwidths. However, this 
nonlinear mapping does not seem to limit the effectiveness of constrained LSP roots 
in providing improved speech enhancement. 
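
The computational advantage noted above comes from the fact that the LSP roots lie on the Unit Circle and can be located by a one-dimensional search over real functions. The following Python sketch illustrates this under stated assumptions. The sign convention A(z) = 1 - sum a_k z^-k, the labelling of the sum polynomial as the "position" set and the difference polynomial as the "difference" set, the grid-plus-bisection root search, and the plain weighted average in `smooth_position_roots` are all illustrative choices and are not taken from the Hansen/Clements or Furui references.

```python
import math


def _unit_circle_zeros(coeffs, antisymmetric, grid=512):
    """Angles in (0, pi) where a symmetric or antisymmetric polynomial vanishes."""
    order = len(coeffs) - 1
    trig = math.sin if antisymmetric else math.cos

    def f(w):
        # Real-valued function whose zeros are the unit-circle root angles.
        return sum(c * trig(w * (order / 2.0 - m)) for m, c in enumerate(coeffs))

    roots = []
    prev_w, prev_f = None, None
    for i in range(1, grid):  # interior points only; skips trivial roots at 0 and pi
        w = math.pi * i / grid
        fw = f(w)
        if fw == 0.0:
            roots.append(w)
        elif prev_f is not None and prev_f * fw < 0.0:
            lo, hi, f_lo = prev_w, w, prev_f
            for _ in range(60):  # bisection on the bracketed sign change
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
        prev_w, prev_f = w, fw
    return roots


def lpc_to_lsp(a):
    """Return (position, difference) LSP angles for predictor coefficients a."""
    poly = [1.0] + [-ak for ak in a] + [0.0]  # A(z) padded to order p + 1
    rev = poly[::-1]                          # z^-(p+1) * A(1/z)
    sum_poly = [x + y for x, y in zip(poly, rev)]
    diff_poly = [x - y for x, y in zip(poly, rev)]
    return _unit_circle_zeros(sum_poly, False), _unit_circle_zeros(diff_poly, True)


def smooth_position_roots(frames, weights):
    """Weighted average of position roots across frames (e.g. K-4 .. K+2)."""
    total = sum(weights)
    count = len(frames[0])
    return [sum(w * fr[i] for w, fr in zip(weights, frames)) / total
            for i in range(count)]
```

For a flat second-order model, `lpc_to_lsp([0.0, 0.0])` gives one angle from each polynomial, interleaved on the upper half of the Unit Circle, which is the alternation property described for a Minimum Phase model.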

The described process is particularly effective when the noise is 
statistically wide-sense stationary during the time interval from the point of 
estimation of the noise power spectrum to the end of the Speech+Noise processed 
using this noise estimate. It seems to be most effective for signal-to-noise ratios 
(SNR) above 10 dB. For interference cases such as automobile road noise and aircraft 
cockpit noise, where much of the spectral energy is at the lower part of the audio 
band, it may function usefully down to 5 dB SNR. For stationary tone-like noise 
such as In-Network hum, the filter has been operated with considerable success for 
SNRs below 0 dB when the VAD gives clear indication of the noise-only frames. 

Claims: 

1. In a telecommunications network comprising a switching node, 
incoming transmission channels connecting to said node and carrying transmissions 
comprising signals and noise from remote locations, and outgoing signal 
transmission channels, a process for filtering noise from said incoming 
transmissions, comprising the steps of: 

converting said incoming transmissions to create an enhanced speech 
signal into consecutive overlapped and time-windowed information frames, each 
frame comprising digital samples taken at a rate sufficient to represent the incoming 
signal by a linear predictive coding (LPC) speech model; 

storing each said frame in a memory of a signal filter, said filter 
including means for performing iterative estimations on said LPC speech model; 

making plural intraframe iterations of the present frame by: 

making in said signal filter an initial estimate of the speech signal 
component for the present frame based upon the total input signal spectrum and a 
current estimate of the noise spectrum; 

generating from said initial estimate a set of equivalent LSP position 
roots for said present frame; 

for each intraframe iteration of each said present frame, smoothing 
said position roots of said present frame with the position roots saved from 
corresponding ones of past frames iterations and plural LSP position roots obtained 
from the first said iterations on plural future frames; and 

repeating the intraframe iteration steps a selected number of times; 

the output of the final said iteration comprising a filtered frame of a real 
time estimate of an incoming speech signal. 

2. The process in accordance with claim 1, wherein said selected past 
frames consist of up to four of the most recent frames; and the selected said future 
frames consist of the nearest two. 

3. The process in accordance with claim 2, comprising the further steps 
of: 

distinguishing between frames with noise-only content versus frames 
having speech content; 

generating a continuous estimate of the noise spectrum using content of 
said noise-only frames; and 

in response to detecting a noise-only frame, updating said 
noise-spectrum estimate. 



4. The process in accordance with claim 3, comprising the further step of 
disconnecting said filter output from said outgoing transmission channel in response 
to detecting a noise-only frame; and shunting said incoming transmissions through 
an attenuator and thence directly to said outgoing transmission channel. 

5. The process in accordance with claim 4, comprising the further steps 
of: 

detecting for each said speech frame whether speech is voiced or 
unvoiced; 

in response to detecting a said speech frame, setting the order of the said 
speech model to the 10th order LPC; and 

in response to detecting a said unvoiced speech frame, setting said order 
significantly lower than said 10th order. 



6. The process in accordance with claim 5, wherein said order setting in 
response to detecting of a said unvoiced speech frame is in a range between the 
fourth order to the sixth order. 

7. The process in accordance with claim 6, wherein said current estimate 
of the present noise frame is derived by a process comprising the steps of: 

determining how many consecutive frames of noise-only are currently 
stored in said filter; 

if the number of said consecutive frames is above a predetermined 
amount, and calculating the average noise power spectrum of said consecutive 
frames; 

measuring the difference between said average noise power spectrum 
and the previously calculated noise power spectrum; and 

adjusting each of the last two-named spectrum by weighting factors 
related to said difference measure, said adjusting forcing the resulting sum of said 
spectrum to conform to a predetermined power spectrum level. 

8. The process in accordance with claim 7, comprising the further steps 
of setting a transmitted incoming noise threshold and determining noise above said 
threshold is present; 

determining whether the incoming call includes voice signal content; 

determining whether the originating number is that of a customer of a 
telecommunications service providing reduced transmitted noise energy; and 

if all said last-named predeterminations are present, activating said 
process at said switching node. 

9. The process in accordance with claim 8, further comprising the step of 
applying weighting to said LSP position root values in each frame, wherein said 
weighting is defined by selectively combining the LSP formant number, the value of 
the total frame power, the frame power threshold, the consecutive noise-threshold 
misses and whether a said count threshold L_max is exceeded by P_count. 

10. The process in accordance with claim 9, wherein the number of 
intraframe iterations made upon each said present frame is between one and seven. 

11. The process in accordance with claim 10, comprising the further 
steps of repeating the said intraframe iteration process upon each successive frame; 
and combining the time-overlapped frame results to create a said output. 